Revised Loss Bounds for the Set Covering Machine and Sample-Compression Loss Bounds for Imbalanced Data

Authors

  • Zakria Hussain
  • François Laviolette
  • Mario Marchand
  • John Shawe-Taylor
  • S. Charles Brubaker
  • Matthew D. Mullin
Abstract

Marchand and Shawe-Taylor (2002) proposed a loss bound for the set covering machine that depends on the observed fraction of positive examples and on what the classifier achieves on the positive training examples. We show that this loss bound is incorrect. We then propose a loss bound, valid for any sample-compression learning algorithm (including the set covering machine), that depends on the observed fraction of positive examples and on what the classifier achieves on them. We also compare numerically the loss bound proposed in this paper with the incorrect bound, the original SCM bound, and a recently proposed loss bound of Marchand and Sokolova (2005) (which does not depend on the observed fraction of positive examples), and show that the latter loss bounds can be substantially larger than the new bound in the presence of imbalanced misclassifications.
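The revised bound itself is derived in the paper and not reproduced on this page; as a point of reference for the numerical comparison mentioned above, the sketch below computes a simplified form of the original SCM-style sample-compression risk bound of Marchand and Shawe-Taylor (2002). The function names, the choice of delta, and the folding of the prior weights over the compression-set size d and the error count k into a single delta are assumptions made for illustration.

    import math

    def log_binom(n, k):
        """log of the binomial coefficient C(n, k), via lgamma for numerical stability."""
        return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

    def compression_bound(m, d, k, delta=0.05):
        """Simplified sample-compression risk bound in the style of
        Marchand & Shawe-Taylor (2002): m training examples, a compression
        set of size d, and k training errors outside the compression set.
        Returns an upper bound on the true risk holding w.p. >= 1 - delta.
        NOTE: this is the *original* SCM-style bound, not the revised
        bound of this paper; prior weights over (d, k) are absorbed
        into delta here for simplicity.
        """
        assert m - d - k > 0, "bound is vacuous unless m > d + k"
        log_terms = log_binom(m, d) + log_binom(m - d, k) + math.log(1.0 / delta)
        return 1.0 - math.exp(-log_terms / (m - d - k))

    # e.g. 1000 examples, a 10-point compression set, 15 training errors:
    print(f"risk <= {compression_bound(1000, 10, 15):.3f}")

Note that this form treats all misclassifications alike; the bounds discussed in the paper refine it by accounting separately for the observed fraction of positive examples.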


Similar articles

Sparsity in machine learning: theory and practice

The thesis explores sparse machine learning algorithms for supervised (classification and regression) and unsupervised (subspace methods) learning. For classification, we review the set covering machine (SCM) and propose new algorithms that directly minimise the SCM's sample compression generalisation error bounds during the training phase. Two of the resulting algorithms are proved to pr...


Bounds on the outer-independent double Italian domination number

An outer-independent double Italian dominating function (OIDIDF) on a graph $G$ with vertex set $V(G)$ is a function $f:V(G)\longrightarrow\{0,1,2,3\}$ such that if $f(v)\in\{0,1\}$ for a vertex $v\in V(G)$ then $\sum_{u\in N[v]}f(u)\geq 3$, and the set $\{u\in V(G)\mid f(u)=0\}$ is independent. The weight of an OIDIDF $f$ is the value $w(f)=\sum_{v\in V(G)}f(v)$. The minimum weight of an OIDIDF on a graph $G$ is cal...
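As a quick sanity check of this definition (the toy graph, the assignment, and the helper name below are illustrative, not from the cited paper), a minimal sketch verifying an OIDIDF on a three-vertex path a-b-c:

    def is_oididf(adj, f):
        """adj: dict vertex -> set of neighbours (open neighbourhood);
        f: dict vertex -> label in {0, 1, 2, 3}."""
        zeros = {v for v in f if f[v] == 0}
        # the set of vertices labelled 0 must be independent
        if any(u in adj[v] for v in zeros for u in zeros):
            return False
        # every vertex labelled 0 or 1 must see total weight >= 3 on N[v]
        return all(f[v] + sum(f[u] for u in adj[v]) >= 3
                   for v in f if f[v] <= 1)

    adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    f = {"a": 0, "b": 3, "c": 0}
    print(is_oididf(adj, f), "weight =", sum(f.values()))  # True, weight = 3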


Estimating a Bounded Normal Mean Relative to Squared Error Loss Function

Let $X_1,\ldots,X_n$ be a random sample from a normal distribution with unknown mean and known variance. The usual estimator of the mean, i.e., the sample mean, is the maximum likelihood estimator, which under the squared error loss function is a minimax and admissible estimator. In many practical situations, the mean is known in advance to lie in a bounded interval. In this case, the maximum likelihood estimator...
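The truncated sentence above refers to the restricted maximum likelihood estimator. Assuming, for illustration, that the interval has the form [-m, m] (the interval, names, and data below are hypothetical), the restricted MLE is the sample mean projected onto the interval:

    import random

    random.seed(0)
    xs = [random.gauss(1.8, 1.0) for _ in range(25)]  # hypothetical sample
    m = 1.0                                           # assumed bound on the mean
    xbar = sum(xs) / len(xs)
    mle = max(-m, min(m, xbar))  # project the sample mean onto [-m, m]
    print(round(mle, 3))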


Efficiency Evaluation and Ranking DMUs in the Presence of Interval Data with Stochastic Bounds

Owing to uncertainty, DEA occasionally faces imprecise data, especially when a set of DMUs includes missing data, ordinal data, interval data, stochastic data, or fuzzy data. Therefore, how to evaluate the efficiency of a set of DMUs in interval environments is a problem worth studying. In this paper, we discuss a new method for evaluation and ranking i...


Bayesian Two-Sample Prediction with Progressively Type-II Censored Data for Some Lifetime Models

Prediction on the basis of censored data is a very important topic in many fields, including the medical and engineering sciences. In this paper, based on the progressive Type-II right censoring scheme, we discuss Bayesian two-sample prediction. A general form of lifetime model, including some well-known and useful models such as the Weibull and Pareto, is considered for obtaining prediction bounds ...


Journal:
  • Journal of Machine Learning Research

Volume: 8  Issue: –

Pages: –

Publication date: 2007